
    Analysis of relative influence of nodes in directed networks

    Many complex networks are described by directed links; in such networks, a link represents, for example, the control of one node over another or a unidirectional flow of information. Various centrality measures are used to determine the relative importance of nodes specifically in directed networks. We analyze one such centrality measure, called the influence. The influence represents the importance of nodes in dynamics such as synchronization, evolutionary dynamics, random walks, and social dynamics. We analytically calculate the influence in various networks, including directed multipartite networks and a directed version of the Watts-Strogatz small-world network. Global properties of networks, such as hierarchy and the position of shortcuts, rather than local properties of the nodes, such as the degree, are shown to be the chief determinants of the influence of nodes in many cases. The developed method is also applicable to the calculation of the PageRank. We also numerically show that in a coupled oscillator system, the threshold for entrainment by a pacemaker is low when the pacemaker is placed on influential nodes. For a type of random network, the analytically derived threshold is approximately equal to the inverse of the influence. We numerically show that this relationship also holds in a random scale-free network and a neural network.
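
    The abstract notes that the method also yields the PageRank. As a minimal illustration of the kind of eigenvector computation involved, the sketch below runs power iteration for PageRank on a tiny directed graph; the damping factor, tolerance, and function name are illustrative choices, not taken from the paper.

        import numpy as np

        def pagerank(adj, d=0.85, tol=1e-10, max_iter=1000):
            """Power iteration for PageRank on a directed graph.
            adj[i][j] = 1 if there is a link i -> j; dangling nodes
            (no out-links) are treated as linking to every node."""
            A = np.asarray(adj, dtype=float)
            n = A.shape[0]
            out = A.sum(axis=1)
            # Row-stochastic transition matrix; dangling rows -> uniform.
            P = np.where(out[:, None] > 0,
                         A / np.maximum(out, 1)[:, None], 1.0 / n)
            r = np.full(n, 1.0 / n)
            for _ in range(max_iter):
                r_new = (1 - d) / n + d * (P.T @ r)
                if np.abs(r_new - r).sum() < tol:
                    break
                r = r_new
            return r

        # Three-node chain with a shortcut: 0 -> 1 -> 2 and 0 -> 2.
        print(pagerank([[0, 1, 1], [0, 0, 1], [0, 0, 0]]))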

    A Bayesian approach to the estimation of maps between Riemannian manifolds

    Let $\Theta$ be a smooth compact oriented manifold without boundary, embedded in a Euclidean space, and let $\gamma$ be a smooth map from $\Theta$ into a Riemannian manifold $\Lambda$. An unknown state $\theta \in \Theta$ is observed via $X = \theta + \epsilon \xi$, where $\epsilon > 0$ is a small parameter and $\xi$ is a white Gaussian noise. For a given smooth prior on $\Theta$ and a smooth estimator $g$ of the map $\gamma$, we derive a second-order asymptotic expansion for the related Bayesian risk. The calculation involves the geometry of the underlying spaces $\Theta$ and $\Lambda$, in particular, the integration-by-parts formula. Using this result, a second-order minimax estimator of $\gamma$ is found, based on the modern theory of harmonic maps and hypo-elliptic differential operators.
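
    For concreteness, the observation model and the Bayes risk being expanded can be written out as below; taking the loss to be squared geodesic distance on $\Lambda$ is an assumption made here for illustration, not a quotation from the paper.

        % Observation model and Bayes risk; \pi is the given smooth prior
        % density on \Theta, and the loss (squared geodesic distance
        % d_\Lambda on \Lambda) is an assumed choice.
        X = \theta + \epsilon\,\xi, \qquad \theta \in \Theta,\quad
        \epsilon > 0, \quad \xi \sim \text{white Gaussian noise},
        \qquad
        R_\epsilon(g) = \int_\Theta \mathbb{E}_\theta\,
          d_\Lambda^{\,2}\bigl(g(X),\, \gamma(\theta)\bigr)\,
          \pi(\theta)\, d\theta.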

    Parameterized Algorithms for Graph Partitioning Problems

    We study a broad class of graph partitioning problems, where each problem is specified by a graph $G=(V,E)$ and parameters $k$ and $p$. We seek a subset $U\subseteq V$ of size $k$, such that $\alpha_1 m_1 + \alpha_2 m_2$ is at most (or at least) $p$, where $\alpha_1,\alpha_2\in\mathbb{R}$ are constants defining the problem, and $m_1, m_2$ are the cardinalities of the edge sets having both endpoints, and exactly one endpoint, in $U$, respectively. This class of fixed cardinality graph partitioning problems (FGPP) encompasses Max $(k,n-k)$-Cut, Min $k$-Vertex Cover, $k$-Densest Subgraph, and $k$-Sparsest Subgraph. Our main result is an $O^*(4^{k+o(k)}\Delta^k)$ algorithm for any problem in this class, where $\Delta \geq 1$ is the maximum degree in the input graph. This resolves an open question posed by Bonnet et al. [IPEC 2013]. We obtain faster algorithms for certain subclasses of FGPPs, parameterized by $p$, or by $(k+p)$. In particular, we give an $O^*(4^{p+o(p)})$ time algorithm for Max $(k,n-k)$-Cut, thus significantly improving the best known $O^*(p^p)$ time algorithm.
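
    To make the objective concrete, the sketch below evaluates $\alpha_1 m_1 + \alpha_2 m_2$ for a candidate set $U$ and recovers Max $(k,n-k)$-Cut as the instance $\alpha_1 = 0$, $\alpha_2 = 1$ (maximized). The brute-force search is for illustration only and is not the paper's algorithm.

        from itertools import combinations

        def fgpp_value(edges, U, alpha1, alpha2):
            """alpha1*m1 + alpha2*m2, where m1 counts edges with both
            endpoints in U and m2 those with exactly one endpoint in U."""
            U = set(U)
            m1 = sum(1 for u, v in edges if u in U and v in U)
            m2 = sum(1 for u, v in edges if (u in U) != (v in U))
            return alpha1 * m1 + alpha2 * m2

        def max_k_cut_brute_force(n, edges, k):
            """Max (k, n-k)-Cut = FGPP with alpha1 = 0, alpha2 = 1,
            maximized; brute force over all size-k subsets."""
            return max((fgpp_value(edges, U, 0, 1), U)
                       for U in combinations(range(n), k))

        # 4-cycle: a 2-vs-2 bipartition class cuts all four edges.
        edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
        print(max_k_cut_brute_force(4, edges, 2))  # (4, (1, 3))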

    Cluster Editing: Kernelization based on Edge Cuts

    Kernelization algorithms for the cluster editing problem have been a popular topic in recent research in parameterized computation. Thus far, most kernelization algorithms for this problem are based on the concept of critical cliques. In this paper, we present new observations and new techniques for the study of kernelization algorithms for the cluster editing problem. Our techniques are based on the study of the relationship between cluster editing and graph edge-cuts. As an application, we present an $O(n^2)$-time algorithm that constructs a $2k$ kernel for the weighted version of the cluster editing problem. Our result matches the best known kernel size for the unweighted version of the cluster editing problem, and significantly improves the previous best kernel of quadratic size for the weighted version of the problem.
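
    As a minimal sketch of what the problem asks (the kernelization itself is not reproduced), the following counts the edge insertions and deletions needed to turn a graph into a given disjoint union of cliques; all names are illustrative.

        def editing_cost(edges, partition):
            """Edge edits needed so each block of `partition` becomes a
            clique and no edges run between blocks (unweighted case)."""
            block = {}
            for b, nodes in enumerate(partition):
                for v in nodes:
                    block[v] = b
            E = {frozenset(e) for e in edges}
            deletions = sum(1 for e in E if block[min(e)] != block[max(e)])
            # Missing intra-block edges must be inserted.
            insertions = sum(len(ns) * (len(ns) - 1) // 2 for ns in partition)
            insertions -= sum(1 for e in E if block[min(e)] == block[max(e)])
            return deletions + insertions

        # Path 0-1-2 clustered as one block: insert edge {0,2}, so cost 1.
        print(editing_cost([(0, 1), (1, 2)], [[0, 1, 2]]))  # 1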

    Dynamical SimRank search on time-varying networks

    SimRank is an appealing pairwise similarity measure based on graph structure. It iteratively follows the intuition that two nodes are assessed as similar if they are pointed to by similar nodes. Many real graphs are large, and their links are constantly subject to minor changes. In this article, we study the efficient dynamical computation of all-pairs SimRank on time-varying graphs. Existing methods for dynamical SimRank computation [e.g., LTSF (Shao et al. in PVLDB 8(8):838–849, 2015) and READS (Zhang et al. in PVLDB 10(5):601–612, 2017)] mainly focus on top-k search with respect to a given query. For all-pairs dynamical SimRank search, Li et al. (EDBT, 2010) proposed an approach that first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains this factorization in response to link updates, at the expense of exactness. As a result, all pairs of SimRanks are updated approximately, yielding (Formula presented.) time and (Formula presented.) memory in a graph with n nodes, where r is the target rank of the low-rank SVD. Our solution to the dynamical computation of SimRank comprises five ingredients: (1) We first consider edge updates that do not accompany new node insertions. We show that the SimRank update (Formula presented.) in response to every link update is expressible as a rank-one Sylvester matrix equation. This provides an incremental method requiring (Formula presented.) time and (Formula presented.) memory in the worst case to update (Formula presented.) pairs of similarities for K iterations. (2) To speed up the computation further, we propose a lossless pruning strategy that captures the “affected areas” of (Formula presented.) to eliminate unnecessary retrieval. This reduces the time of the incremental SimRank to (Formula presented.), where m is the number of edges in the old graph, and (Formula presented.) is the size of the “affected areas” in (Formula presented.); in practice, (Formula presented.). (3) We also consider edge updates that accompany node insertions, and categorize them into three cases, according to which end of the inserted edge is a new node. For each case, we devise an efficient incremental algorithm that can support new node insertions and accurately update the affected SimRanks. (4) We next study batch updates for dynamical SimRank computation, and design an efficient batch incremental method that handles “similar sink edges” simultaneously and eliminates redundant edge updates. (5) To achieve linear memory, we devise a memory-efficient strategy that dynamically updates all pairs of SimRanks column by column in just (Formula presented.) memory, without the need to store all (Formula presented.) pairs of old SimRank scores. Experimental studies on various datasets demonstrate that our solution substantially outperforms the existing incremental SimRank methods, and is faster and more memory-efficient than its competitors on million-scale graphs.
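
    For reference, the sketch below computes naive all-pairs SimRank by fixed-point iteration, the baseline that incremental methods such as those above accelerate. The decay factor C, the iteration count K, and the matrix formulation are standard choices, not the article's algorithm.

        import numpy as np

        def simrank(adj, C=0.6, K=10):
            """Naive all-pairs SimRank: s(a, b) averages the similarity
            of the in-neighbours of a and b, scaled by C, with
            s(a, a) = 1. Quadratic memory; for illustration only."""
            A = np.asarray(adj, dtype=float)   # A[i, j] = 1 for edge i -> j
            n = A.shape[0]
            indeg = A.sum(axis=0)
            W = A / np.maximum(indeg, 1)       # column-normalised adjacency
            S = np.eye(n)
            for _ in range(K):
                S = C * (W.T @ S @ W)
                np.fill_diagonal(S, 1.0)       # pin s(a, a) = 1
            return S

        # Nodes 1 and 2 are both pointed to by node 0 and become similar.
        print(simrank([[0, 1, 1], [0, 0, 0], [0, 0, 0]]))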

    Proximity curves for potential-based clustering

    The concept of a proximity curve and a new algorithm are proposed for obtaining clusters in a finite set of data points in a finite-dimensional Euclidean space. Each point is endowed with a potential constructed by means of a multi-dimensional Cauchy density, contributing to an overall anisotropic potential function. Guided by the steepest descent algorithm, the data points are successively visited and removed one by one, and at each stage the overall potential is updated and the magnitude of its local gradient is calculated. The result is a finite sequence of tuples, the proximity curve, whose pattern is analysed to give rise to a deterministic clustering. The finite set of all such proximity curves, in conjunction with a simulation study of their distribution, results in a probabilistic clustering represented by a distribution on the set of dendrograms. A two-dimensional synthetic data set is used to illustrate the proposed potential-based clustering idea. It is shown that the results achieved are plausible, since both the ‘geographic distribution’ of the data points and the ‘topographic features’ imposed by the potential function are well reflected in the suggested clustering. Experiments using the Iris data set are conducted for validation purposes on classification and clustering benchmark data. The results are consistent with the proposed theoretical framework and data properties, and they open new approaches for considering data processing from different perspectives and for interpreting the contribution of data attributes to patterns.
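
    A minimal sketch of the potential construction, assuming an isotropic Cauchy-type kernel with scale gamma (a simplification: the paper's overall potential is anisotropic). It evaluates the potential and the magnitude of its gradient at a query point, the two quantities driving the proximity curve.

        import numpy as np

        def potential(x, points, gamma=1.0):
            """Overall potential at x: a sum of Cauchy-type kernels
            centred at the data points (isotropic simplification)."""
            d2 = np.sum((points - x) ** 2, axis=1)
            return np.sum(1.0 / (1.0 + d2 / gamma ** 2))

        def grad_magnitude(x, points, gamma=1.0):
            """Magnitude of the potential's gradient at x, in closed form:
            grad = sum_i 2(p_i - x) / (gamma^2 (1 + |p_i - x|^2/gamma^2)^2)."""
            diff = points - x
            d2 = np.sum(diff ** 2, axis=1)
            w = 2.0 / (gamma ** 2 * (1.0 + d2 / gamma ** 2) ** 2)
            return np.linalg.norm((w[:, None] * diff).sum(axis=0))

        pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
        print(potential(np.array([0.05, 0.0]), pts))
        print(grad_magnitude(np.array([1.0, 0.0]), pts))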

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
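
    The Su and Dy methods themselves are not reproduced here; instead, the sketch below shows one simple seeding with the three properties in the chapter title: it runs in time linear in the number of points, it is deterministic, and, ties aside, it is order-invariant. The scheme (first centre nearest the data mean, then maximin) is an illustrative stand-in.

        import numpy as np

        def deterministic_maximin_init(X, k):
            """Deterministic, order-invariant k-means seeding (a sketch,
            not Su and Dy's methods): the first centre is the point
            closest to the overall mean; each further centre is the point
            farthest from its nearest already-chosen centre."""
            X = np.asarray(X, dtype=float)
            first = np.argmin(((X - X.mean(axis=0)) ** 2).sum(axis=1))
            centers = [X[first]]
            for _ in range(1, k):
                d = np.min([((X - c) ** 2).sum(axis=1) for c in centers],
                           axis=0)
                centers.append(X[np.argmax(d)])
            return np.array(centers)

        X = np.array([[0, 0], [0, 1], [10, 10], [10, 11], [5, 5]])
        print(deterministic_maximin_init(X, 2))  # [[5. 5.] [10. 11.]]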

    Generalized alignment-based trace clustering of process behavior

    Process mining techniques use event logs containing real process executions in order to mine, align, and extend process models. The partition of an event log into trace variants facilitates the understanding and analysis of traces, so it is a common pre-processing step in process mining environments. Trace clustering automates this partition; traditionally, it has been applied without taking into consideration the availability of a process model. In this paper, we extend our previous work on process-model-based trace clustering by allowing cluster centroids to have a complex structure that can range from a partial order down to a subnet of the initial process model. This way, the new clustering framework presented in this paper is able to cluster together traces that are distant only due to concurrency or loop constructs in process models. We present a complexity analysis of the different instantiations of the trace clustering framework, and we have implemented it in a prototype tool that has been tested on different datasets.
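
    A sketch of the pre-processing step mentioned above: partitioning an event log into trace variants by grouping identical activity sequences. The log format and names are assumptions for illustration; the closing comment points at the limitation that the model-based clustering in the paper addresses.

        from collections import defaultdict

        def trace_variants(log):
            """Group traces with identical activity sequences into variants.
            `log` maps case ids to activity sequences (assumed format)."""
            variants = defaultdict(list)
            for case_id, activities in log.items():
                variants[tuple(activities)].append(case_id)
            return dict(variants)

        log = {
            "c1": ["register", "check", "pay"],
            "c2": ["register", "pay", "check"],  # check/pay are concurrent
            "c3": ["register", "check", "pay"],
        }
        # Plain variant grouping separates c2 from c1/c3 even though they
        # differ only in the interleaving of two concurrent activities;
        # closing that gap is what the model-based clustering is for.
        print(trace_variants(log))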

    Commonality Preserving Multiple Instance Clustering Based on Diverse Density

    Image-set clustering is the problem of decomposing a given image set into disjoint subsets satisfying specified criteria. For single-vector image representations, a proximity or similarity criterion is widely applied, i.e., proximal or similar images form a cluster. The recent trend in image description, however, is local-feature based, i.e., an image is described by multiple local features, e.g., SIFT, SURF, and so on. With this description, which criterion should be employed for the clustering? As an answer to this question, this paper presents an image-set clustering method based on commonality; that is, images preserving strong commonality (coherent local features) form a cluster. Under this criterion, image variations that do not affect the common features are harmless. In the case of face images, hair-style changes and partial occlusions by glasses may not affect the cluster formation. We defined four commonality measures based on Diverse Density, which are used in agglomerative clustering. Through comparative experiments, we confirmed that two of our methods perform better than the other methods examined in the experiments.
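
    The paper's four Diverse Density based measures are not reproduced here; as a simplified stand-in, the sketch below scores the commonality of two images, each a bag of local feature vectors, by a symmetrised best-match average. It captures the intended behaviour that shared local features dominate while unmatched ones (e.g., an occluded region) are harmless; the kernel and scale are illustrative.

        import numpy as np

        def commonality(bag_a, bag_b, sigma=1.0):
            """Simplified bag-level commonality (not the paper's Diverse
            Density measures): average over features of one bag of the
            best Gaussian match in the other, symmetrised."""
            def best_match(P, Q):
                d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)
                return np.exp(-d2 / (2 * sigma ** 2)).max(axis=1).mean()
            return 0.5 * (best_match(bag_a, bag_b) + best_match(bag_b, bag_a))

        a = np.array([[0.0, 0.0], [1.0, 1.0]])
        b = np.array([[0.0, 0.1], [1.0, 0.9], [9.0, 9.0]])  # outlier feature
        print(commonality(a, b))  # stays high despite the unmatched outlier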
